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A VIDEO DECODING DEVICE , ^^CNDON 



Field of the Invention 



The present invention relates to a video decoding device and 
in particular but not exclusively to a video decoding device 
for decoding Moving Picture Experts Group (MPEG) and 
associated encoded video types. 



Background of the Invention 

Digital video compression is used in many different 
environments and has become the first choice in many different 
situations. Three such environments are: broadcasting and the 
DVB (Digital Video Broadcasting) standard; storage of video 
data and the DVD (Digital Versatile Disc) standards; and 
internet node to node communication using standards such as 
MPEG-4 encoding. 

Traditionally most of these coders and decoders (Codecs) 
associated with these are based on the original MPEG (Moving 
Picture Experts Group - the international standards 
organisation/international electro-technical commission video 
decoding and encoding methods) standards, commonly known as 
MPEG2 . Codecs and the coding/decoding algorithms powering them 
have improved since the original algorithms were set as the 
standards. These improvements have been incorporated into a 
series of Codecs which whilst not being MPEG standards still 
use MPEG type segmentation and compression steps. 

MPEG and MPEG type codecs treat video data which consists of a 
series of images - these images are known as pictures. There 
are two types of picture, a frame picture or a field picture. 
A frame picture is further subdivided into a pair of fields, 



each field comprising alternate rows of a scanned image. A 
field picture contains the rows of image data from a single 
field. Each picture can be treated by the codec as a reference 
picture or as a non reference picture. 

5 

All of these MPEG type schemes encode and decode with 
reference to the frame or field pictures and in particular to 
a specific reference or multiple reference pictures. In its 
most basic form this involves encoding an initial reference 

10 picture followed by a series of non-reference pictures. In 
practice however the distinction between reference and non- 
reference pictures is not as simple. Reference pictures can be 
both intra (I) pictures and/or P pictures. An I picture is a 
picture which is encoded without reference to any other 

15 picture. Both P and B pictures are encoded however using data 
from one or more other pictures. B pictures are not used to 
encode other pictures. 

The I pictures are broken down into a series of macro blocks 
20 (MB) . The blocks are then processed. The B and P pictures 
search the macro blocks from the P and I pictures to find 
similar blocks to those in the B and P pictures. The B and P 
pictures are then encoded in terms of macro blocks which 
differ from the macro blocks of the I and P pictures in terms 
25 of content and also position. 

Decoding schemes perform the reverse of the encoding scheme 
that is recreating the original P and I pictures, and then 
using the I and. P pictures and. extra P and. B picturre data to 
30 reconstruct the. original P and B pictures - 

Currant cc!ininsrciall.y usable video, decoding chip solutions, hassr 
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optimised and are not easily capable of being modified to 
decode video streams encoded using any other standard. 

Known flexible decoders are available but have been based on 
the general processing unit (GPU) such as those used within 
personal computers. These flexible decoders perform the whole 
decoding process using instructions stored in memory external 
to the processor, which is selected and ioaded dependent on 
the video standard required to be decoded. These CPU's are not 
optimised to perform video decoding, consume a large amount of 
power and produce a large amount of heat when compared against 
the previous MPEG2 video processing units used. The GPU 
solution is also expensive compared against the MPEG2 video 
processing units previously used and therefore making the cost 
of production of. a flexible video decoding system expensive. 

* 

• ♦ 

Summary of the Invention 

• > 

It is an aim of the embodiments of the present invention: to 
address or at least partially mitigate one or more of the 
problems discussed previously. 



Embodiments of the present invention may provide a single 
device capable of being flexible enough to decode a range of 
25 standards . 



30 



There is provided according to the invention a video decoding 
circuit comprising: a first video data processor; a second 
Video data processor; and a connection connecting said first 
Video data processor and said second data processor; wherein 
saxd first video data processor is arranged to receive a first 
signal comprising encoded video data, process said first 
signal to provide a second signal and output said second 
signal, said first video data processor being arranged to 
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process said first signal dependent on at least part of said 
received first signal, and said second video data processor is 
arranged to receive at least a part of said second signal, 
process said at least a part of said second signal to provide 
a third signal, and output said third signal, said second and 
third signals comprising a decoded video image stream, and 
said second video data processor is arranged to process said 
at least part of said second signal dependent on at least part 
of said at least part of second signal. 

The first video data processor may be arranged to variable 
length decode said received first signal to produce a decoded 
first signal. 

15 The first video data processor may be arranged to separate 
said first signal data into at least a first part and a second 
part, wherein said first part may comprise at least one of: 
pixel data; residual data, and wherein said second part may 
comprise motion vector data. 
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The first video data processor may be arranged to inverse 
guantize said first part of said first signal. 

The first video data processor may be arranged to 
domain transform said first part of said first signal. 




The first video data processor may be arranged to combine said 
spatial domain- transformed and/or inverse- quantized first part 
of., said first signal with said- second part- of said first 
30 signals 

The second, video ..data- processor may be arranged to interpolate. 



5 

The second video data processor may be arranged to interpolate 
at least a first part of said second signal using one oi 
horizontal and vertical interpolation. 

5 The circuit may further comprise a memory, said second videc 
data processor may be arranged to store said interpolated part 
of said second signal in said memory. 

The second video data processor may be arranged to interpolate 
10 said stored interpolated first part of said second signal 
using the other one of horizontal and vertical interpolation. 

The second video data processor may be arranged to combine 
said interpolated part of said second signal and a furthei 
15 part of said second signal, wherein said interpolated part oi 
said second signal may comprise an estimated macro block, anc 
said further part of said second signal may comprise residual 
error data. 

* 

■ ■ 

20 The second video data processor may be arranged to filter at 
least one of said at least one part of said second signal anc 
said third signal. 

The filter may comprise at least one of a de-ringing filter 
25 and a de-blocking filter. 

The connection may comprises a bus connecting said first anc 
second video data processors. 



The circuit may further comprise a memory device, said memory 
device may possibly be connected to said bus. 



The first video data processor may have an output for 
outputting said second signal to said memory device via said 



bus . 



The second video data processor may have an input for 
receiving said parts of said second signal from said memory 
device via said bus- 

The connection may comprise a data interconnect, said data 
interconnect possibly directly connecting said first video 
data processor and said second video data processor. 

The first video data processor may have an output for 
outputting said second signal to said data interconnect. 

The second video data processor may have an input for 
receiving said parts of said second signal from said data 
interconnect . 

The second video data processor may receive part of said parts 
of said second signal from said data interconnect and part of 
said parts of said second signal from said bus. 

The first signal may be at least one of: a MPEG2 encoded video 
stream; a H.263 encoded video stream; a RealVideo9 encoded 
video stream; a Windows media player encoded video stream; a 
H.264 encoded video stream. 

The. second signal, may. comprise at least- one of: buffer base 
address word; picture, level parameter header word-; picture 
level parameter word; macro-block header word; slice parameter, 
word; mor±on vector., horizontal lumar. word; motions vector 
„-4„.i i,™-- word: mo_tion_ vector horizontal, chromsv word; 



motion vector vertical chroma word; pixel data reference word 
and pixel data residual word. 

The first video data processor may comprise a data packer. 

The second video data processor may comprise a data packer. 

The data packer may comprise; an input, said input may be 
arranged to receive said second signal, said second signal may 
comprise data words; means for ordering said data words; and 
an output, said output may be arranged to transmit data 
packets comprising ordered data words. 

An integrated circuit may comprise a circuit as detailed 
1 5 above . 

The first video data processor may comprise a very long 
instruction word processor. 
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The very long instruction word processor may be adapted to 
process said first signal further dependent on a set of 
instructions stored in a memory. 

The second video processor may comprise a programmable 
£j processor. 

According to a second aspect of the invention there is 
provided a video decoding method comprising the steps of 
receiving at a first video data processor a first signal 
30 comprising encoded video data, processing said first signal to 
provide a second signal dependent on at least part of said 
first signal, outputting said second signal, receiving at 
least a part of said second signal at a second video data 
processor, processing said at least part of said second signal 
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to provide a third signal dependent on at least part of said 
second signal, and outputting said third signal, wherein said 
second and third signals comprise a decoded video image 



stream 



The step of processing said first signal may comprise the step 
of variable length decoding said first signal. 

The step of processing said first signal may comprise the step 
of separating said first signal into at least a first part and 
a second part, wherein said first part comprises at least one 
of: pixel data; residual data, and wherein said second part 
comprises motion vector data. 

The step of processing said first signal further may comprise 
the step of inverse quantizing said first part of said first 
signal . 

The step of processing said first signal further may comprise 
the step of spatial domain transforming said first part of 
said first signal. 



The step of processing said first signal may further comprise 
the step of combining said spatial domain transformed and/or 
25 inverse quantized first part of said first signal with said 
second part of said first signal. 

The step- of processing at least a part of said second signal 
may further comprise, the step of interpolating at least a 
30 f j rs.tr part of said second, signi 



The ster^of-inESxpola±±ng at. least- a first- part of said- second, 
signal may comprise the, step of. inrexpo l a^tng at -least a fxrsrr 
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part of said second signal using one of horizontal and 
vertical interpolation . 

The step of interpolating may further comprise storing said 
5 interpolated part of said second signal* 

* 

The step of interpolating further may comprise interpolating 
said interpolated part of said second signal using the other 
one of horizontal and vertical interpolation. 

10 

The step of processing at least part of said second signal may 
further comprise combining said interpolated part of said 
second signal and a further part of said second signal , 
wherein said interpolated part of said second signal . may 
15 comprise an estimated macro block, and said further part of 
said second signal may comprise residual error data. 

> 

The step of processing at least a part of said second signal 
may further comprise the step of filtering, wherein said step 
20 of filtering may comprise at least one of the steps r de- 
ringing filtering and de-blocking filtering. 

The step of outputting said second signal may further comprise 
the step of storing said second signal in a memory. 

25 

The step of receiving at least part of said second signal may 
comprise receiving said at least part of said second signal 
directly from the first video data processor. 

i 

30 The step of receiving at least part of said second signal may 
comprise receiving a first part of said at least part of said 
second signal directly from said first video data processor 
and a second part of said at least part of said second signal 
from said memory. * 
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The step of processing said first signal may further comprise 
the step of packetizing said second signal. 

5 The step of processing said second signal may further comprise 
the steps of: packetizing said at least part of said second 
signal; storing said at least part of said second signal in a 
memory; and receiving said at least part of said stored second 
signal from said memory. 

10 

Brief Description of Drawings 

For a better understanding of the present invention and how 
the same may be carried into effect , reference may now be made 
15 by way of example only to the accompanying drawings in which: 

Figure 1 shows a pair of frames from a sequence of frames 
forming a moving image; 

Figure 2 shows a schematic view of a MPEG type encoder; 
20 . Figure 3 shows a schematic view of a MPEG type decoder in 

which embodiments of the present invention can be implemented; 

Figure 4 a shows a schematic view of an embodiment of a 
video decoding system; 

Figure 4b shows a schematic view of further embodiments 
25 of a video decoding system; 

Figure 5a shows a schematic view of a first arrangement 
for a decoding video processor as shown in figures 4a and 4b; 

Figure 5h_„ shows a schematic view of a second arrangement 
for a decoding video, processor as shown in figures 4a ancL 4b; 
30 Figure 5c shoi*s a schematic view of a third arrangement 

for a deco din g video processor as. shown in figures 4a _and 4b; 

Figure.. 5d_ shows a schematic vier of a fourth-,. a r ra ggemerrt 
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Figure 6a shows a schematic view of a first arrangement 
for a video co-processor as shown in figures 4a and 4b; 

Figure 6b shows a schematic view of a second arrangement 
for a video co-processor as shown in figures 4a and 4b; 

Figure 6c shows a schematic view of a third arrangement 
for a video co-processor as shown in figures 4a and 4b; 

Figure 7 shows a schematic view of a predictor 
constructor as shown in figures 6a, 6b and 6c; 

Figure 8 shows a schematic view of a data packer as shown 
in figures 5b, 5d and 6c; 

Figure 9 shows decoded data words passed to the packer 
unit as shown in figure 8; 

Figure 10 shows a block diagram detailing the processes 
of the data packer as shown in figure 8; 

Figure 11 shows a data structure for image data output 
from the data packer as shown in figure 8; 

Figure 12 shows a schematic view of the interpolation 
engine as used in the predictor interpolator as shown in 
figure 7; 

■ 

> Figure 13 shows a timing diagram explaining, the 

'synchronous' and 'asynchronous' connection embodiments 
between the video decoding processor and the video co- 
processor as shown in figure 4b. 

Detailed Description of Embodiments of the Present Invention 

Reference is made to figure 1, which shows a pair of frame 
Pictures demonstrating an MPEG type encoding system. As 
detailed earlier the generic MPEG type compression system 
involves sampling video as a series of pictures 101,- 103. 
These frame pictures may in embodiments of the invention be 
further divided into field pictures 150, 152. m the 
embodiment a frame picture has two field pictures, each field 
picture comprising alternate lines of the frame picture. 
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The picture is further divided into a series of macro-blocks 
(MB), of which one macro block 105 is shown. The macro-block 
contains sampled luma (brightness) and croma (colour) data. 
Typically the chroma data is sub sampled. Thus in an 
embodiment where the macro block is 16 by 16 pixels large, 
there are 256 luma samples and a smaller number of chroma 
samples. The macro blocks are further decomposed into smaller 
blocks, the number and structure of which is determined by the 
type of codec used. For example in MPEG-2 encoding each macro- 
block is subdivided into six blocks. Four blocks each of which 
is 8 by 8 pixels large contain the luma data. Two further 
blocks also 8 by 8 pixels large contain the sub-sampled chroma 
data. Other coding standards may for instance use blocks 8 by 
4 pixels, or 4 by 8 pixels large. Therefore in a full screen 
image picture of 720x480 pixels (NTSC National Television 
Systems Committee) or 720x576 (PAL Phase Alternating Line) 
pixels, there are many macro blocks to be encoded in every 
pictures . 

With reference to Figure 2, the steps of a MPEG type 
compression system will be ■ further detailed in order to 
understand the invention better. The MPEG type coder comprises 
the items: block encoder 201; frequency domain encoder 203; 
quantizer 205; and variable length encoder 207. 

The flow of data is such that the image data is passed 
initially to the block encoder 201. After processing by the 
block encoder 201, data is passed to the frequency domain 
encoder 203. After, frequency coding 203 the. data is. then 
passed to the_ quantizer 205. After quantization the data is 
then pa sses! to the variable . length" encoder. 207 . The ..output of 

=ki ^ i^nrrH-. Pn^od^ 207 is_ th.s_ MEEG type " v±d=o. ima^a 
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I pictures 101 and P, B pictures 103 follow slightly different 
processes in their encoding methods. 

The block encoder 201, in encoding an I picture, samples each 
macro block (MB) for luma and chroma values as is known in the 
art. This sampled data is passed to the frequency domain 
encoder 203. A copy of the sampled I picture data is also 
stored for P or B picture encoding. 

The block encoder 201, in processing a P or B picture uses the 
stored I or P picture data. The series of macro-block samples 
of the P or B pictures is searched for against the I or P 
picture macro-blocks. If a macro-block in the I or P picture 
is similar to the macro-block in the P or B picture, a 'match' 
is found. The matched macro-block is encoded using two pieces, 
of information. 

The first piece of information is known as the residual data 
The residual data is the difference in luma and chroma between 
the two macro-blocks (107, 105), that is the macro-block in 
the P or B picture and the matching macro-block in the I or P 
picture. 

The second piece of information created is the block movement 
predictor, otherwise known as the motion vector. The motion 
vector 109 as illustrated schematically in Figure 1 is the 
spatial difference between the matched macro-blocks. To 
illustrate the motion vector 109, the macro-block 107 of the P 
or B picture is mapped onto the I or P picture. The mapped 
macro-block is referenced by 111. The motion vector 109 shows 
the difference in position between the mapped macro-block 111 
and the I or P macro-block 105. This is purely for 
illustrative purposes. 
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The I picture data is passed to the frequency domain encoder 
203. The B or P residual data is also passed to the frequency 
domain encoder 203, whereas the motion vector data is passed 
5 directly to the variable length encoder 207. 

The frequency domain encoder 203 applies a transform 
converting the data from each block in the macro-block from 
the spatial domain to the frequency domain. As is known in the 
10 art such transforms may be discrete cosine transforms (DCT) , 
but may also be other known frequency domain transforms. The 
frequency domain output from the frequency domain encoder 203 
is then passed to the quantizer 205. 
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The quantizer 205 performs a re-sampling of the frequency 
domain information of block data from the macro-block. This is 
achieved by dividing the result of the frequency domain 
encoder by a predetermined number to remove some of the least 
significant bit values. 

The quantized frequency domain data stream is then passed to 
the variable length encoder 207, whereby the predicted macro- 
block data is recombined with the motion vector data. 

25 The variable length encoder 207 applies a process which 
removes redundancy from the bit stream by detecting sequences 
of 0's or l's and encoding them into a more efficient form. 
Variable length coding sequences such as Hoffman or Huffman 
coding- sequences along with others known in the art may be 

30 used. 

With reference to figure.. 3, the reverse'- of_ the en< 
process.., that, of decoding, a MPEG" type video.. stxsanc'.is_ shown. 



15 



The MPEG type decoder comprises a data input 300, a variable 
length decoder 301, an inverse quantizer 303, a spatial domain 
encoder 305, an adder 307, a macro-block reference image 
selector 309, an in-loop filter 351, a post processing filter 
5 353, and an image output 311. 

The decoder also comprises a series of interconnects 313, 315, 
317, 319, 321, 323, 325 and 327 which connect together the 
various components of the decoder. 
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30 



The interconnect 313 connects the variable length decoder 301 
to the inverse quantizer 303. The interconnect 319 connects 
the inverse quantizer 303 to the spatial domain encoder 305. 
The interconnect 321 connects the spatial domain encoder 305 
to the adder 307. The interconnect 315 connects the variable 
length decoder to the macro-block reference image selector 
309. The interconnect 317 connects the macro-block reference 
image selector 309 to the adder 307. The interconnect .325 
connects the output of the adder 307 to the in-loop filter 
351. The interconnect 323. connects the output of the in-loop 
filter to the macro-block selector 309. The interconnect 327 
connects the output of the in-loop filter to the input of the 
post-processing filter 353. 

The MPEG type compressed signal is input 300 to the variable 
length decoder 301. 

The variable length decoder (VXD) performs two functions on 
the initial data stream. The first function is applying an 
inverse of the variable length encoding algorithm to the data 
stream. 



16 



The variable length decoder 301 also determines the picture 
type, in other words whether the signal is a I, P or B 
picture. 

If the current picture is a I picture all of the data is 
passed via interconnect 313 to the inverse quantizer 303. If 
the current picture is a P or B picture the residual data is 
passed via interconnect 313 to the inverse quantizer 303, and 
the motion vector and the prediction mode for the macro-block 
are passed to the macro-block selector 309 via the 
interconnect 315. 

The inverse quantizer 303 performs the inverse procedure to 
the quantization unit in other words the data stream is 
returned to its original bit length by multiplying the 
received value by the predetermined number used. The values 
lost from the least significant bits during the quantization 
process though can not be restored. This data is passed via 
the interconnect 319 to the spatial domain transformer 305. ' 

The spatial domain encoder 305 performs a frequency to spatial 
domain transform on the data, thus bringing the data back into 
the spatial domain as originally encoded. Known spatial domain 
transforms include the inverse discrete cosine transform. 

The output from the spatial domain encoder 305 is the macro 
block luma and chroma information as originally encoded, but 
with, some data loss dependent- on the value of the quantization, 
originally performed. For the intra. (I picture) values the 
data_ is the_.absolute luma. and chroma values, and for the P or B 
picture values, the. data: is the relative luma_ and chroma:. 
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These values are output to a first input of the adder 307 via 
the interconnect 321. 

In the parallel data flow path, the predicted motion vector 
5 information is passed to the macro-block selector 309. This 
data is used to identify a matched element stored in a 
previously decoded I or P picture. This matched element is 
then passed via interconnect 317 to the second input of the 
adder 307 . 

10 

The adder 307 for an I picture macro-block outputs only the 
data passed to it from the spatial domain transformer 305. The 
adder 307 for a P or B picture outputs the combined values of 
the selected predicted macro-block element and the residual 
15 data output by the spatial domain transformer 305. 

Two types of filters are shown in the decoding embodiment 
shown in figure 3. The first type is the in-loop filter -351. 

r 

The in-loop filter is defined by the type of MPEG stream. 
20 Thus, for example, some standards H.263, WMV, Real Video and 
H.264 have de-blocking filters implemented as part of the 
decoding. 

In some standards there is no requirement for in-loop 
25 filtering in which case the in-loop filter functions as a data 
buffer. 

The second type of filter is the post processing filter 353. 
The post processing filter 353 are not defined by any one 
30 standard and can perform a range of filtering functions to 
improve the picture quality. These filters may be de-blocking, 
de-ringing or other image enhancement or noise reduction 
filters. Furthermore these filters where used are applied to 
data outside of the picture storage loop. 
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A de-blocking filter as known in the art can be used both as 
an in-loop and post processing filter. The filter works in the 
spatial domain- The filter examines the borders between macro- 
5 blocks to determine if the image quality between macro-blocks 
is significantly different. Differences in quantization 
between macro-blocks can produce boundaries between blocks 
whereby the same feature or line is not contiguous between 
macro-blocks. The de-blocking filter attempts to improve the 

10 image by examining if the non-contiguous feature has a low or 
high frequency component. Where the feature is slow moving, 
and therefore low frequency, it is possible to smooth the 
difference between the lines and produce a less visible 
artefact between macro-blocks. A rapidly moving feature though 

15 is not smoothed, as this would produce blurring of the 
feature. 

A de-ringing filter is a further adaptive filter which 
attempts to reduce the effect of harsh quantization. This type 

20 of filter is only used by the post processing filter. Harsh 
quantization of the macro-block removes large amounts of high 
frequency information. Visually this is seen in the appearance 
of rings of artefacts centred about a defined feature. This 
type of artefact is known in the art as mosquito noise. This 

25 type of noise can be mitigated by the application of a de- 
ringing filter which compares the current macro-block against 
previously calculated ones. It also examines the pixel values 
in. the current macr:o-block to attempt to mitigate the noise. 

30 Thus for botlL intra and predicted picture data a series of 
macro-b locks whictL when conrhiTr.ed wouid f oim a totaJL pictu re is 
output, f rem:: the_ adde,r . 3071 Intra' anct P pictuxiesL cair_th.ein.be 
passed, back. vxe..ths~ intsrcGnnsct 323 foir stoxarrer and.. used- for 
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As can be seen from the above discussion, there is a potential 
algorithmic division between data paths once the data has been 
variable length decoded. Motion vector information and 
5 residual data do not interact in the reconstruction of P and E 
pictures until they are received by the adder /combiner 307. It 
is therefore possible to separate the processing of these twc 
data paths without requiring data to be transferred between 
the two processing stages • 

10 

This separation of motion vector and residual decoding is more 
pronounced once the action of interpolation is introduced. As 
the motion vector may not point to an exact pixel value , but 
may point to a fractional position such as one half or a 

15 quarter of a pixel value away from a whole row or column 
value, the macro-block selector may be required to carry out 
an interpolation or re-sampling of image data. In order to 
carry out this interpolation process, I or P picture data is 
imported and the interpolator is required to carry out *i many 

20 numerical calculations in order to calculate each interpolated 
image. In tests the percentage of the time used to run on a 
general processor unit the motion compensation process part of 
the decoding of a MPEG data stream can be up to 40% of the 
whole run time. 

25 

Two types of video decoder are described with respect to the 
present invention. Both types are configurable video decoders 
and are therefore flexible enough to decode a range of 
different encoded video data streams. 
30 . 

In the first the separation of the two main processes is 
carried out by two separate data processors. In the second the 
two separate data processors are connected by an interconnect 
between the two. 
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With reference to figure 4a the first type of the flexible 
video decoder 4 99 is "shown. The first type represents the 
insertion of a video co-processor to allow flexibility in the 
5 flexible video decoder 4 99. Communication between the video 
decoder processor and the video co-processor is via a common 
data bus. 

The flexible video decoder comprises a video decoder processor 
10 (VDP) 509, a video coprocessor (VCP) 519, a shared memory 501, 
a shared data bus 503, and a series of interconnects 505, 507, 
517. 

The shared data bus 503 interconnects the shared memory 501 
15 via the interconnect 505 f the video decoder processor (VDP) 
509 via the interconnect 507, and the video co-processor (VCP) 
519 via the interconnect 517. 

All of the components described are created on a single 
20 integrated circuit. In some embodiments of the present 
invention however further components such as external 
input /output devices and data controllers can be connected to 
the flexible video decoder, or may be implemented as part of 
the flexible video decoder . 

25 

A flexible video decoder system receives the compressed video 
data on the shared data bus 503. The data can be received by 
the flexible ...video decoders 4 99 either, from an external source 
via an input / output ; device not shown or from the shared memory^ 
30 501. 

* 

This., datrai is_ pao3=d_to _ the video decoder, pracssson (VDF) 509 . 
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With reference to figures 5a and 6a a video decoding processor 
and video co-processor embodying the invention is shown. 
Figure 5a shows in further detail a schematic view of the 
video decoding processor VDP as shown in figure 4a. The video 
5 decoding processor 509 comprises a memory management unit 403, 
a variable length decoder 4 07, an inverse quantizer 411, a 
spatial domain encoder 415, an internal data interconnect 417, 
an external data interconnect 401, and interconnects 405v 409, 
413. 

10 

It is understood that in some embodiments of the present 
invention the video decoding processor is implemented on a 
Very Long Instruction Word (VLIW) processor. In such an 
implementation the data interconnects described are controlled 
15 and implemented in software. 

The external data bus 401 connects the external interconnect 
507 to the memory management unit 403. The memory management 

■ 

unit is further connected via interconnect 405 to the variable 
20 length decoder 407. The variable length decoder 407 is 
connected via interconnect 409 to the inverse quantizer 411. 
The inverse quantizer 411 is connected via- the interconnect 
413 to the spatial domain encoder 415. The internal data 
interconnect 417 connects the outputs of the variable length 
25 decoder and the spatial domain encoder to the memory 
management unit 403. 

The memory management unit 403 comprises bus routing and 
addressing systems, and memory cache in order to buffer data 
30 and access the shared data bus 503. Thus data received by the 
memory management unit 403 is initially buffered prior to 
being processed. 
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in embodiments of the present invention whereby the video 
decoding processor is implemented on a VLIW processor the 
memory management unit is that belonging to the VLIW processor 
and contains instruction and data caches. 

The memory management unit 403 further is arranged to detect 
the video data received by the video decoding processor 509 in 
order to determine the format of the video data and configures 
the VDP dependent on the format of the video data. 

in further embodiments of the present invention whereby the 
video decoding processor is implemented on a VLIW processor 
the detection of the received compressed video data is carried 
out by a software process executing on the VLIW processor. The 
VLIW processor is then arranged to send commands to the video 
co-processor to configure the co-processor. 

The received data is then transmitted to the variable length 
decoder 407 for variable length decoding. The variable length 
decoder 407 is arranged to decode the encoded data stream in a 
meaner as explained previously. 

The variable length decoder further is arranged to segment the 
data dependent on the data type and coding format. If the 
decoded data is part of an intra picture the data is passed 
via the interconnect 409 to the inverse quantizer. If the 
decoded data is a P or B picture the data is segmented into 
its residual data and motion vector data components. The 
motion vector data component is output on the internal data, 
interconnect 417, whereas, the residuaL data is passed via the 
interconnect 409 to the. inverse quantiser 411^. 
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receives the residual data component , or decoded macro-block 
data and performs an inverse quantization dependent on the 
format of the encoded data. The data is then passed via the 
interconnect 413 to the spatial domain encoder 415. 

5 

The spatial domain encoder is arranged to carry out a spatial 
domain encoding, in other words a frequency to spatial domain 
transform. The spatial domain encoder thus performs performs a 
frequency to spatial domain transform dependent on . the format 
10 of the received data. The processed data is then output onto 
the internal data interconnect 417. 

Hence both the motion vector, and residual data for a P or B 
picture and the reference data for a I picture is passed via 

15 the internal data interconnect 417 to the memory management 

■ 

unit 403 r where the data is buffered prior to being 
transmitted via the interconnect 507 and the shared data bus 

■ 

503 to the shared memory 501. 

> 

20 If the data is an intra picture , the encoded data is stored 
in memory as one of the I pictures to be referred to by the 
other P and B pictures. Similarly if the data is a P picture, 
the encoded motion vector and residual data is stored in 
memory as a P picture and can be referred to by other B or P 

25 pictures. If however the data is part of a B picture in order 
to reconstruct the true image the macro-block and residual 
error need to be created from the motion vector and residual 
data values. 

30 The video co-processor (VCP) 519 is arranged to receive data 
and carry out image construction. Figure 6a shows a schematic 
view of an embodiment of the video coprocessor 519 as shown in 
figure 4a. 
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The video coprocessor comprises a memory management unit 453, 
a predictor constructor 459, an external data interconnect 451 
and an internal data interconnect 455. 

The external data interconnect 4 51 connects the interconnect 
517 to the memory management unit 453. The memory management 
unit is connected via the internal data interconnect to the 
predictor constructor 459. 

The memory management unit 453 is arranged to buffer, receive 
and transmit data via the interconnect 517 to and from the 
shared memory 501. This information is then placed on the 
internal data connect 455 for use by the predictor constructor 
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in one embodiment of the present invention the video co- 
processor 519 is implemented on a programmable processor. In 
such an embodiment the memory management unit is controlled by 
the programmable processor. 

The memory management unit 4 53 also is arranged to detect the 
format of the received video data. The memory management unit 
453 is also arranged to configure the predictor constructor 
459 dependent on the format of the received data. 

The predictor constructor 459 is arranged to receive data from 
the internal, data interconnect 455 and process the motion 
vector data dependent on the format of the data received. The 
predictor constructor can therefore request reference data and 
perform, interpolation on. the received reference, data in order 
to create a required- reference! macro-block. 



t 
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. The predictor constructor 459 further is arranged to combine 
the residual and reference data to produce a full predictor 
macro-block. 



0 
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With reference to figure 7 a schematic view of a predictor 
constructor is shown. The predictor constructor 459 comprises 
an unpacker 683, a predictor error storage 685, an 
adder/combiner 687, a predictor fetch 681, and a predictor 
interpolator 689. The predictor constructor 459 further 
comprises an I/O interconnect 671, a predictor interconnect 
673, and interconnects 672, 677, 675. 

in an embodiment of the present invention whereby the video 
co-processor 519 is implemented on a programmable processor, 
the processes described with reference to figure 7 may be 
implemented by a series of software instructions or a 
combination of a series of software instructions and hardware 
implementation. In such an implementation involving software 
processes the term interconnect incorporates the transfer of 
data between software processes and between registers and/or 
internal memory. 

The i/o interconnect 671 connects the video coprocessor 
internal data interconnect 455 to the adder/combiner 687, and 
to the predictor fetch 681, and to the unpacker 683. The 
predictor interconnect 673 connects the unpacker to the 
predictor interpolator 689, and to the predictor error storage 
685. The interconnect 672 connects the predictor fetch 687 to 
the predictor interpolator 689. The interconnect 677 connects 
the predictor interpolator 689 to the adder/combiner 687. The 
interconnect 675 connects the predictor error storage 685 to 
the adder/combiner 687. 
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The unpacker 683 is arranged to receive data from the memory 
management unit 453 via the internal data connect 455 and to 
separate the motion vector data from the residual error data. 
The residual data is passed via the interconnect 673 to the 
predictor error storage 685, whereas the motion vector 
predictor data is passed to the predictor interpolator 689 via 
the interconnect 673. 

The predictor interpolator 689 receives the predictor motion 
vector data and calculates the required reference macro-block 
data. The predictor interpolator 689 then requests this data 
via the interconnect 672 from the predictor fetch 681. 

The predictor fetch 681 requests the required I or P picture 
macro-block from the shared memory 501, and retrieves the I or 
P picture macro-block and passes it to the predictor 
interpolator 689. 

The predictor interpolator is arranged to perform an 
interpolation based on the configuration information dependent 
on the format of the video data. The processed video data is 
than passed via the interconnect 677 to the adder/combiner 
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25 With respect to figure 12 a schematic view of an embodiment of 
the interpolator is shown. The interpolator comprises a 
duplicator 1202, a horizontal interpolator 1203, a memory 
array, and a vertical interpolator. The interpolator further 
comprises, an input T201 and -air output 1209, and interconnects 

30 1204, 1209, 1211* 1213, and 1215. 

The input 1201 is_ connected, to. the. duplicator-. 12B2 via the 
interconnect 120.4 . The duplicator. 120 2 is. connected to the 
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horizontal interpolator 1203 is connected to the memory array 
1205 via interconnect 1211. The memory array is connected to 
the vertical interpolator 1207 via interconnect 1213. The 
vertical interpolator is connected to the output via the 
5 interconnect 1215. 

The input receives the pixel data which is required to be 
interpolated. This pixel information is input into the 
interpolator horizontal line by horizontal line, with the 
10 pixels being input from the leftmost to the rightmost in each 
line. The pixels are then passed to the duplicator. 

The duplicator is used where the reference macro-block being 
used is only partially available. For instance in some video 

15 encoding standards the reference macro-block closest to the 
current macro-block is not a complete macro-block as stored in 
the reference image. For example if a feature displayed in a 
corner of the picture is moved towards the centre of the image 
in a later or earlier picture, the closest reference image is 

20 the one containing the partial feature. In such a situation 
the complete macro-block is required to be reconstructed from 
the partially retrieved macro-block. This reconstruction is 
carried out by the duplicator, which repeats the last partial 
macro-block pixel value according to the video encoding 

25 format . 

Where the macro-block is completely retrieved from the shared 
memory, the duplicator passes the pixel data to the horizontal 
interpolator 1203 . 

30 

The horizontal interpolator receives the pixel data and 
performs the horizontal interpolation dependent on the degree 
of interpolation and the format of the video stream. For 
example where a half pixel interpolation is required, this can 
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be carried out using a sample delay and an adder. Pixel data 
is passed both to the sample delay and to one of the inputs of 
the adder. The second input of the adder is connected to the 
output of the sample delay. The output of the adder is 

5 therefore the sum of the current and previous pixel. A 
division by two can then be carried out to produce an average 
value - or the interpolated value between the two pixels. The 
horizontally interpolated data is then passed into a memory 
array so that the pixels form an array of lines of 

10 interpolated values. 

Once the memory array is full, in other words all of the 
pixels have been horizontally interpolated the memory array is 
read vertically to the vertical interpolator 1207. 
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The vertical interpolator is capable of carrying out 
interpolation in the vertical plane. The vertical 
interpolation can for example be a simple half pixel 
interpolation and comprise a similar apparatus as the 
horizontal interpolator - an adder, divider and a sample delay 
element to create and average value. 



The output of the vertical interpolator is then passed to the 
output 1209, which in embodiments of the present invention is 
25 output to the adder/combiner 687. 

in further embodiments of the present invention, the. 
horizontal. _and vertical interpolators can.be bypassed if. there 
is no interpolation- required in. either the horizontal- or 
30 vertical degx.ee. Further embodiments of the interpolator may 
only ha*e one division stage after the final interpolatrorr 

Furth3er_ embodiments:, of. the - pres=nt_. indention can 
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performing horizontal interpolation. Further embodiments of 
the present invention may only use one interpolation stage. In 
such an implementation the data is read into the interpolation 
stage in a first order in order to carry out the required 
5 horizontal or vertical interpolation. The interpolation stage 
outputs the data to the memory array. The data is then read 
out of the memory array and back into the same interpolation 
stage in a second different order to implement the other 
required interpolation. 

10 

The predictor error storage 685 is arranged to store the 
residual data otherwise known as predictor error data whilst 
the predictor interpolator performs interpolation on the 
motion vector data. The predictor error storage transmits the 
15 predictor error data to the interconnect 67 5. r . 

The adder/combiner 687 receives inputs from the interconnects 
677 and 675 comprising the interpolated predictor macro- 
blocks , and predictor error data respectively and combines 
20 these to produce completed video image data. The image data is 
output via the data interconnect 67 9 to the internal data 
interconnect. The data is passed to the memory management unit 
4 53 to be passed to the shared memory. 

r 

25 In a further embodiment of the present invention the video co- 
processor 519 further comprises an in-loop filter 4 61. The in- 
loop filter is connected via the internal data interconnect 
4 65 to the memory management unit 453 and predictor 
constructor 459. 

30 

The in-loop filter 461 is arranged to perform in-loop 
filtering, such as de-blocking filtering as explained earlier 
on the received video data. 
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The in-loop filter 4 61 is configured dependent on the video 
data format. The in-loop filter receives image data either 
from the memory management unit 453, or the predictor 
constructor 459* The in-loop filter is capable of processing 
5 both reference and non-reference image data. Where the video 
data format does not explicitly allow the use of in-loop 
filtering the image data sent to the in-loop filtering is 
stored both pre and post filtering so that pre-filtered data 
is used for future referencing but the post filtering data is 
10 used for the improved display image. 

With reference to figures 5b and 6a a schematic view of a 
second embodiment of the present invention is shown. 

15 The video decoding processor 509 comprises a memory management 
unit 4 03, a variable length decoder 407 , an inverse quantizer 
or 411, a spatial domain encoder 415, a data packer 419, an 
internal data interconnect 417, an external data interconnect 
401, and interconnects 405, 409, 413. 

20 

The memory management unit 403, the variable length decoder 
407, inverse quantizer 411 and spatial domain encoder 415 are 
connected and operate as described in relation to Figure 5a. 
The internal data interconnect 417 further connects the data 
25 packer 419 to the memory management unit 403, the variable 
length decoder 407 and the spatial domain encoder 415. 

The data packers 419 receives the data output by the variable 
length decoder 407 and spatial domain decoder 415 and 
30 processes, the data so to produce a standardised video data 
packet. The purpose- of such a packing . process- is. to reduce the. 
memory bandwidth., regret red, in t rsasfiairri ng data from the video 
de^pdinrr- processor'. As. is. disirus^Hrf- in. data i 1 lat^r. tfaeL__ 
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data is 9 bits long. It is therefore possible to pack 7 
residual picture data elements into a single 64 bit word* 
Furthermore in embodiments of the present invention where the 
video decoding processor 509 is implemented in a VLIW 
5 processor the data packer can be implemented by a software 
process . 

In order to understand the role of the data packer the data 
words produced by the VDP 509 and received by the data packer 
10 419 will be described. These are shown in figure 9* 

The first word produced by the VDP for each image is the 
buffer base address (BBA) word 901 as shown in figure 9a. The 
buffer base address word defines the base memory location 
15 within which to place the following image data. Embodiments of 
the present invention can use the full 32 bit word (which 

■ 

provides a maximum of 2 32 (4,294,967,296) addressable 
locations) or use a smaller number of addressable locations 
addressable using fewer bits. 

20 

Following the buffer base address word 901 the VDP 503 
produces a picture level parameter header word (PLPH) 903. The 
picture level parameter header 903 further comprises a coding 
standard nibble 905, picture level parameter bits 907, and a 
25 parameter word length byte 909. 

The coding standard nibble 905 indicates the current video 
standard being decoded. In one embodiment of the present 
invention the nibble 0000 b (where the symbol b indicates that 
30 the number referred to is a binary number) indicates a MPEG2 
stream, whereas a nibble 0001 b indicates a MPEG4 stream (a 
standard proposed by the Motion Picture Experts Group and 
hereby incorporated by reference, also including the QivX 
video encoding method. The nibble 0010b indicates a RealVideo9 
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signal, a proprietary encoding system proposed by Real 
Networks. The nibble 0011 b indicates a H.263 stream (H.263 is 
a International Telecommunication Union - Telecommunications 
ITU-T standard recommendation, which is hereby incorporated by 
5 reference) for video coding over low bit rate communication. 
The nibble 0100 b would indicate that the video stream was a 
Microsoft WindowsMedia Video 9 video stream, a proprietary 
standard created by Microsoft which is hereby incorporated by 
reference. A nibble of 1000b would indicate that the video 
10 stream was a H.2 64 standard stream (H.264 is a further video 
coding over communication systems recommendation by the ITU-T 
board which is hereby incorporated by reference) . 

The coding standard information is needed since the different 
15 MPEG type encoding systems have variations in the type of data 
contained in the macro-block data. Thus for this reason the 
DivX and MPEG4 types can be grouped together as the syntax of 
the macro-blocks for these two standards are the same. 

20 The picture level parameter bits 907 pre-indicate the presence 
of certain types of data in the picture level parameter words 
911. 

The picture level parameter word length byte 909 defines the 
25 number of parameter words which follow the picture level 
parameter header word 903. The maximum value that the picture 
level parameter word lengtn can have is 2 8 -l (255) . 

Thus following the picture level parameter, header word 903 is 
30 a number of_ picture level parameter words, one oil which, is. 
shown by example in figure 9(c) 911.. The picture level 
p argmstre rsi: define, the. vari ab les, pre-lndi cated by ttte. picture 
levels paranr=jL=r. bit s.. 907. Th«~se_ variables- der±n^_~sucii itjains as. 
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picture type, picture size, interpolation modes, filtering 
type and filter variables. 

Once the picture level parameters are set for a specific image 
5 the following data words passed by the VDU 509 to the date 
packer 419 define the many macro-blocks which go to make up 
the image. In an embodiment of the present invention the data 
is passed a macro-block at a time, and in the order describee 
below. 

10 

The first word defining each macro-block is the macro-block 
header first word 913. With reference to figure 9(d) the 
macro-block header first word is further detailed. 

■ 

15 The macro-block header first word 913 further comprises a 

standard specific parameter word 915, an E (End) flag 917, a I 

(new sLice) flag 919, a S (Skipped macro-block) flagf;. 921, an I 

■ ■ 

(Intra macro-block) flag 923, and a motion vector length word 
925. 

20 

The standard specific parameter word 915, together with the 
whole of the macro-block header second word 927 (the macro- 
block second header word following directly after the macro- 
block first header word and as shown with reference, to figure 
25 9(e)) comprise information relating to the standard specific 
macro-block prediction mode. 

In other words both the standard specific parameter word 915 
and the macro-block header second word 927 comprise such 
30 parameters as the type of macro-block being decoded, the 
number and direction of the reference images to use in order 
to reconstruct the current frame/field. 
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For example the table below shows the various predictor code 
values representing the various macro-block types in a MPEG2 
encoded stream. These values may differ in meaning for 
different encoding methods. 

5 



Macro— clock lype 


Predictor code value 


0 


1 - Intra 


1 


2 - Predicted forward 


2 


3 - Predicted backward 


3 


4 — Predicted interlaced forward 


4 


5 - Predicted interlaced backward 


5 


6 — Predicted bi-directional 


6 


7 - Predicted interlaced bi- 
directional 



Thus for the first type of macro-block the macro-block being 
processed is a intra ' block. The second and third types of 
macro-block define frame based pictures which are referenced 

10 to frames stored either prior or after the currently processed 
frame. The fourth and fifth types of macro-block define field 
based pictures which are referenced to fields stored either 
prior or after the currently based fields. The sixth type of 
macro-block defines a frame based picture that is referenced 

15 to frames both prior and after the currently processed frame. 
Finally the seventh image defines a field based picture that 
is referenced to fields both prior and after the currently 
processed fields. 

20 TheL. four flag bits, I fiag 923, S flag 92L, L flag 919, and E 
fla^_ 917 are inserted into the macro-block header: word in 
ordejc to enable, the. packezr to perform, the. packing, process, more 
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The I flag 923 indicates whether the current macro-block is 
part of an intra macro-block. 

The S flag 921 indicates whether or not this macro-block was 
5 skipped during coding. 

The S flag 921 is only asserted when the I flag 923 is not 
asserted, in other words the two flags are mutually exclusive 
as an intra macro-block can not be skipped. 
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20 



25 



30 



The L flag 919 indicates whether this is a first macro-block 
of a slice. 

The E flag 917 indicates whether this macro-block is the last 
macro-block for the current picture. 

The motion vector length word 925 defines the number of motion 
vectors within the current macro block. The number of motion 
vectors required to produce a finished macro-block is equal to 
(the value of the motion vector length word)+l. Thus a motion 
vector length word of 00000 b would indicate that there was 1 
motron vector to process, and a motion vector length word of 
11111b would indicate that there were 32 motion vector to 
process. 

* 

The situation of there being 0 motion vectors in the macro- 
block is provided for by the assertion of the I flag 923 
indicating that the current macro-block is intra-coded and 
therefore without motion vectors. 

The number of motion vector words in P or B picture macro- 
blocks is dependent on the type of encoding used. 
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For example using MPEG2 encoding as an example the following 
table shows the number of motion of vectors required. 





xype uiscriprxon 


No or Motion vectors 
two or ruv words; 

• 


0 


Intra 


0 


1 

J. 


rreuicueu jtor 


1 i A \ 
1 \H) 


2 


Predicted back 


1 (4) 


3 


Predicted interlaced for 


2 (8) 


4 


Predicted interlaced back 

> 


2 (8) 


5 


Predicted bi-dir 


2 (8) 


6 

• 


Predicted interlaced bi- 
dir 


4 (16) 



5 If the L flag 919 is asserted the words sent following the 
macro-block header words 913 and 927 are the slice parameter 
words, one of which is displayed with reference to figure 9(f) 
929. If the L macro-block flag 1027 is not asserted no slice 
parameter words are sent and the packer instead outputs the 
10 next type of data word. 

Thus in other words if the video stream is not a H.264 encoded 
stream and the macro-block being decoded is not the first 
macro-block of the stream there are no slice parameter words 
15 929. 

The slice parameter words define variables used in the 
processing of H.26.4 encoded streams.. 

20 If_ the current, macro-block is a intra macro-block, as 
indicated by the I flag 923 in. the macro-block header first 
word. 913 . belngi aaaertsd no motrLon vec tor ixiformactlon is_. 
re:Cjni rsj to re C£?H3 iz nicir the marnno— block . and. ttis_ pad~P v passeB:. 
71 *3* tt *^'3 n 3zzrr. s *s r uSjs or ".c a . *~ *r 3 *v "~** • • — ** ~ ~ *^ ^ 3 e *^t* 2 s zi 2 - i *~ 
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the current macro-block is not a intra macro -block the VDP 509 
passes to the data packer 513 a series of words defining the 
motion vector information. As discussed previously the 
composition and the number of motion vector words per macro- 
block is dependent on the type and variant of the coding 
scheme used. 

In order to locate a macro-block on a horizontal and vertical 
axis, each motion vector describes a horizontal and vertical 
component. Furthermore in all of the coding methods the luma 
and chroma motion vectors are sent separately. Thus the 
smallest number of motion vector words per motion vector and 
therefore per macro-block (for a non-reference frame/field) is 
4 - horizontal luma, vertical luma, horizontal chroma, 
vertical chroma. 

• - 

A quartet of motion vector words representing a single motion 
vector as used in a embodiment of the present invention; is 
shown in figures 9(g)-(j). These four motion vector words 
represent the basic four motion vector words used in, for 
example, a MPEG2 encoded data stream where a predicted forward 
or predicted back type macro-block is being decoded.. 

In the current example the order that the VDP passes motion 
vector words is in the order horizontal luma word, vertical 
luma word, horizontal chroma word, vertical chroma word. In 
other embodiments other combinations are possible and are 
discussed below. 

With reference to figure 9(g) the motion vector horizontal 
luma word 931 is shown. The motion vector horizontal luma word 
931 comprises a reference picture index value 933 and a 
horizontal luma component 935. 
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The reference picture index value 933 is a 4 bit number which 
indicates which of the maximum of 16 stored pictures form the 
basis image. 

The horizontal luma component 935 is a 12 bit number which is 
capable of representing a offset in the range of -1024 to 
1023.5 in half pixel precision or -512 to 511.75 where the 
precision is one quarter pixel. The horizontal luma component 
therefore describes an horizontal offset of somewhere between 
these numbers from the current macro-block and the intra or P 
macro-block. 



With reference to figure 9(h) the motion vector vertical luma 
word 937 is shown. The motion vector vertical luma word 937 
15 comprises a field indicator bit 939 and a vertical luma 
component 941. 

The field indicator bit 939 indicates whether the current 
macro-block is the upper or lower field of the pair of fields. 
20 Where the current macro-block is frame based, the field 
indicator is undefined and may be a 0 or 1. Where the current 
macro-block is field based, a motion vector defining an upper 
field macro-block has a first value and the motion vector 
defining a lower field has the second and opposite value. 

25 

The vertical luma component 941 is a 12 bit number which is 
capable of representing an offset in the range of -1024 to 
1023.5 in half, pixel precision or -512 to 511.75 where the 
precision is one quarter pixel. The vertical., luma component- 
30 therefore describes a vertical- offset of somewhere between- 
these- numbers f ronr the current macro-block and the intra or P 
macro -block. 
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With reference to figure 9(i) the motion vector horizontal 
chroma word 943 is shown. The motion vector horizontal chroma 
word 943 comprises a horizontal chroma component 945. 

The horizontal chroma component 945 is a 12 bit number which 
is capable of representing an offset in the range of -1024 to 
1023.5 in half pixel precision or -512 to 511.75 where the 
precision is one quarter pixel. The horizontal chroma 
component therefore describes a horizontal chroma offset of 
somewhere between these numbers from the current macro-block 
and the intra or P macro-block. 

With reference to figure 9(j) the motion vector vertical 
chroma word 947 is shown. The motion vector vertical chroma 
word 947 comprises a vertical chroma component 947. 

The vertical chroma component 947 is a 12 bit number which is 
capable of representing an offset in the range of -1024 to 
1023.5 in half pixel precision or -512 to 511.75 where the 
precision is one quarter pixel. The vertical chroma component 
therefore describes a vertical chroma offset of somewhere 
between these numbers from the current macro-block and the 
intra or P macro-block. 

In further embodiments of the present invention the reference 
value and field indicator may be stored more than once per 
quartet of motion vector words. 

As previously described more than one quartet of vector words 
may be required per macro-block in order to define the 
estimated macro-block location for more than a single picture 
as in bi-directional encoding, and also for more than a single 
frame/field as in interlaced image encoding. 
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As also previously mentioned and determined by the macro-block 
header first word 913 and the motion vector length 925 up to 
32 motion vectors may be defined in a single macro-block. Thus 
for example in H.264 32 motion vectors may be required and 
therefore 128 motion vector words required for a single macro- 
block. 

Where there is more than one motion vector to be defined per 
macro block, embodiments of the present invention are arranged 
) such that the VDP 509 outputs all of the luma motion vector 
words before outputting all of the chroma motion vector words. 
Other embodiments of the present invention are arranged such 
that the VDP 50 9 outputs each motion vector quartet 
individually one at a time. 

■ 

If the macro-block being decoded by the VDP is an intra macro- 
block the next type of data word to be output by the VDP are 
pixel data reference words. With reference to figure 9(k) a 
pixel data reference word 951 is described in more detail. The 
10 pixel data reference word 951 comprises a pixel reference byte 
953. The pixel reference byte is the intra macro-block decoded 
pixel data as an 8 bit value. Thus the pixel has a value in 
the range from 0 to 255. 

25 If the macro-block being decoded by the VDP is part of a 
predicted macro-block the next type of data word to be output 
by the VDP are pixel data residual words. With reference to 
figure 9(1) a pixel data residual word 955 is described in 
more detail. The pixel data residual word 955 comprises a 

30 pixel residual_ value 95T. The pixel residual, value is the 
pixel residual value with reference to a known, reference 
value.. The. 9 bit: value gives, a relative shift- of- the range 
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Providing the current macro-block is not skipped, as indicated 
by the S flag 921 in the macro-block header first word 913, 
the VDP 509 outputs 384 words of either pixel data residual 
words 955 or pixel data reference words 951- This defines six 
8 by 8 blocks - 4 blocks defining luma values and 2 blocks 
defining chroma values. The pixel word order for each of these 
blocks is row by row, in other words a complete first row is 
output from left to right before starting again on a second 
row from left to right. 

In frame images in all standards except H.2 64 and in frame 
macro-blocks encoded in H.264 adaptive field/frame pictures, 
the row order is always sent using an interlaced pattern - in 
other words first the odd rows of pixel data are sent by the 
VDP and then the even rows of pixel data is sent. This 
ordering is carried out by the VDP prior to sending it to the 
data packer. 

In MPEG2 or H.264 field pictures and H.264 field macro-blocks 
in adaptive field/frame pictures each block contains data from 
one field only. 

With reference to figure 6a as implemented in the second 
embodiment the video coprocessor 519 comprises the same as the 
video coprocessor 519 implemented in the first embodiment of 
the present invention. The second embodiment predictor 
constructor 459 as shown in figure 7 is the same as that 
implemented in the first embodiment excepting that the 
unpacker 683 of the predictor constructor 459 is arranged to 
unpack the video data packets prior to segmenting the data 
into residual and motion vector components. 
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The second embodiment predictor interpolator can further 
comprise an interpolation engine as described previously and 
shown in figure 12. 

The first type of implementation is therefore shown by figure 
4a and detailed in the first two embodiments, the first shown 
by a combination of figures 5a and 6a and the second shown by 
the combination of figures 5b and 6a, which are further 
detailed by the figures 7 to 12. This first type represents a 
more flexible and cost effective solution than either the GPU 
or current video decoding systems. 

However, the reliance on passing all of the information from 
the VDP to the memory and then from the memory to the VCP can 
cause the VDP problems in some embodiments of the invention 
with respect to efficient memory bandwidth utilization. 
Although the addition of a data packer as detailed in the 
second embodiment may increase the efficient use of memory 
bandwidth to and from the VDP it may do so at the cost of 
inefficient use of the VDP in terms of processing video and 
possibly audio data. 

Therefore with reference to figure 4b a schematic view of a 
second type of flexible video decoder is shown. This 
embodiment attempts to improve on some of the problems of the 
previous embodiment . 

Where elements perform... substantially the same tasks, in the 
second type: and its embodiments as the first type ancL. its 
embodiments as shown in figure 4 a the reference figures are 
the same. 
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memory 501, a shared data bus 503, and a series of 
interconnects 505, 507, 517. The flexible video decoder 
further comprises a data interconnect 511. 

5 The shared data bus 503 interconnects the shared memory 501 
via the interconnect 505, the video decoder processor (VDP) 
509 via the interconnect 507, and the video co-processor (VCP) 
519 via the interconnect 517. The data interconnect 511 
connects the video decoding processor 509 to the video co- 
10 processor 519. 

Thus in this embodiment of the present invention there is a 
direct connection between the video decoding processor 509 and 
video coprocessor 519* 

15 

* 

This direct connection can be exploited by the use of a 
^synchronised' data connection or an ^asynchronised' "\ data 
connection. 

20 Figure 13 shows a schematic view of the differences between 
* synchronised' and ^asynchronised' data connections. The 
* synchronised' data connection as shown in figure 13a is one 
in which data processed by the video decoding processor is 
passed to the video coprocessor and is processed as soon as it 

25 is received. This can be seen in that the pictures n, n+1 and 
n+2 are received at the VDP at times ti, t 2 , and t 3 . Each 
picture is decoded 1351, 1355, 1359, and the decoded 
information passed to the VCP and the picture reconstructed as 
soon as possible. As the reconstruction of each picture takes 

30 more time than the decoding stage the VCP soon starts to hold 
up the processing in the VDP, shown as the stall time in the 
VDP 1367 which also causes the VCP to over-run in the 
reconstruction of each picture as shown by the areas 1363 and 
1365. Thus in other words the VDP and VCP have to be 
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synchronised in order that data passes smoothly through the 
VDP and VCP pair so that a new picture can be displayed within 
the picture refresh period. 

in the asynchronous' data connection as shown in figure 13b, 
although the data is passed via the direct data connection, 
the video coprocessor is only required to receive and store 
the data in the video decoding processor timeframe and can 
therefore process the data over a more flexible time period. 
As is shown in figure 13b, although there is an effective 
delay of one time period from the receiving the encoded data 
at the VDP and decoding the picture 1301, 1307, and 1315, and 
reconstructing the picture at the VCP, 1303, 1309m and 1313 

^ n . nfT nr the VDP as it waits for the VCP to 
there are no stalling o£ me vur 

clear its previous picture reconstruction. Furthermore as the 
VDP is not kept in a stalled condition it is capable of 
performing other tasks during the time periods where it is not 
decoding the received pictures, as shown by the periods 1305 
and 1311. These tasks include audio decoding or other data 
related imaging such as subtitle display. 

With reference to figure 5c and figure 6b schematic views of a 
third embodiment of the present invention are described which 
may exploit a 'synchronous' data connection. 

The VDP 509 as shown in figure 5c, comprises a memory 
management unit 403, a variable length decoder 407, an inverse 
quantizer 411, a spatial domain encoder 415, an internal data. 

_ A , n _._ ^x^srv^i data interconnect 4 01, and 
interconnect 417, an exue^— x uoa-o. 

30 interconnects 405, 409, 413.. 

The extern*! data, bus.. 401. .connects the external, interconnect 
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length decoder 407. The variable length decoder 407 is 
connected via interconnect 409 to the inverse quantizer 411 
The inverse quantizer 411 is connected via the interconnect 
.413 to the spatial domain encoder 415. 

The internal data interconnect 417 in the third embodiment 
differs from the internal data interconnects as described in 
the previous embodiments as the outputs of the variable length 
decoder and the spatial domain encoder are not connected to 
the memory management unit 403, but connects the outputs of 
the variable length decoder and the spatial domain encoder to 
the interconnect 511. 

Therefore the VDP functions in a similar manner to that 
described in the first two embodiments except the output from 
the variable length decoder 407 and spatial domain encoder 415 
are not passed back to the memory management unit 403 but are 
passed via the interconnect 511 to the video co-processor 519. 

The schematic view of the third embodiment VCP 519 is shown in 

figure 6b. The third embodiment VCP 519 comprises a memory 

management unit 453, a predictor constructor 459, an external 

data interconnect 451, the in-loop filter 619 and an internal 

data interconnect 455. It can be appreciated that further 

embodiments of the VCP 519 may not require an in-loop filter 
619. 



The external data interconnect 4 51 connects the interconnect 
517 to the memory management unit 453. The memory management 
unit is connected via the internal data interconnect to the 
predictor constructor 459 and the in-loop filter 619. The 
third embodiment as shown in figure 6b differs from the first 
and second embodiment VCP as the internal data interconnect 
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455 is connected to the data interconnect 511, thus connecting 
the VDP 509 to the components of the VCP 519. 

Aside from the manner in which the VCP as shown in figure 6b 
receives the data from the VDP as shown in figure 5c, the 
functionality of the VCP is substantially similar to that 
detailed in the VCP described in the first embodiment. 
Therefore the predictor constructor as described previously 
and as shown in figure 7 and the interpolation engine as shown 
in figure 12 are both applicable to the third embodiment. 

in such a 'synchronous' data connection the VDP 509 and VCP 
519 are connected together in a system known as a master-slave 
connection. Thus data passed from the VDP 509 is received by 
the VCP 519 and processed as soon as it was received. 



A fourth embodiment of the present invention is shown by the 
combination of the VDP as shown in figure 5d and the VCP as 
shown in figure 6b as incorporated into the implementation as 
20 shown in figure 4b. 

in this fourth embodiment the VDP processes the data passed 
from the variable length decoder 409 and spatial domain 
encoder 415 into data packet format in a similar manner to 
that detailed earlier in the second embodiment of the 
invention. A schematic view of such an embodiment of a VDP 509 
is shown in figure 5d. The VDP implemented in the fourth 
embodiment of the invention, differs from the VDP implemented 
in the third embodiment as a data packer 419, similar to that 
described earlier in the second embodiment is inserted between, 
the internal, data, interconnect 417 and . the data. 
511. Similarly the" VCF" predictor" constructor 4 59 comprises, an 
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processed data in a manner similar to that described earlier 
in the second embodiment. The fourth embodiment therefore has 
the advantage of a direct link between the VDP and VCP and 
also has efficiently packed data using the link. 

5 

The A synchronous' data connection embodiments, the third and 
fourth embodiments , retain the flexibility of decoding various 
types of video data but without the inefficiencies created by 
using the memory management unit 403 of the VDP 509 to both 

10 receive the raw encoded data and also transmit the partially 
processed video data. The * synchronous' data connection 
embodiments though by directly linking the VDP 509 and VCP 519 
may require a relatively strict timing synchronisation. In 
order that both the VDP and VCP process data efficiently both 

15 the VDP and VCP decode each picture at the same rate. If this 
synchronisation is not maintained either the VDP waits for the 
VCP to finish processing and catch up with the VDP, or the VCP 
waits for the VDP to. finish processing a macro-block or data 
packet. Internal buffering of the data can prevent some of the 

20 delays. 

If the VDP is required to do more than video decoding of the 
received data stream then the ^synchronous' data connection 
means that the VCP is not being efficiently used. The VCP 
25 during these times is left idle. 

The * asynchronous' data connection embodiments as shown by the 
fifth and sixth embodiments of the present invention are aimed 
to reduce to any potential inefficiencies and strict data 
30 synchronisation required by the 'synchronous' data connection, 
but without imposing further demands on the VDP 509. 

The fifth embodiment of the present invention is described 
with reference to figures 4b, 5c and 6c, which show schematic 
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views* of a VDP 509 and VCP 519 which may exploit an 
^asynchronous' data connection such as shown in the second 
type. 

5 The VDP as implemented in the fifth embodiment is similar to 
that implemented in the third embodiment of the present 
invention, wherein data is passed from the outputs of the 
variable length decoder 407. and spatial domain encoder 415 to 
the interconnect 511. The data passed is unpacked and is 
10 transmitted via the interconnect to the VCP. 

The VCP 519 as shown in the schematic view in figure 6c 
comprises a memory management unit 4 53, a predictor 
constructor 459, an external data interconnect 451, the in- 

15 loop filter 619 and an internal data interconnect 455. It can 
be appreciated that further embodiments of the VCP 519 may not 
require an in-loop filter 619. The VCP 519 as shown in the 
fifth embodiment of the invention differs from those described 
in all previous embodiment VCP implementations in that the VCP 

20 further comprises a data packer unit 457. 

The external data interconnect 451 connects the interconnect 
517 to the memory management unit 453. The memory management 
unit is connected via the internal data interconnect to the 
25 predictor constructor 459, the in-loop filter 619, and the 
data interconnect 511. The embodiment as shown in figure 6c 
further has the internal data interconnect 455 being connected 
to the data packer unit 457. 

30 With reference to figure. 8 a schematic view of_ a data packer 
45T and application of a data packer wili. be described i n . 
furt±p~r. detail . The data_parJzer_ comprise a a packet packer .301, 
a memory buffer" unit .S03, and.- in f ^o^jnE^ts 35X^.353 and_855. 
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The first interconnect 851 connects the data interconnect 511 
to the input of the packet packer 801. The second interconnect 
853 connects the output of the packet packer 801 to the input 
of the memory buffer unit 803. The third interconnect 855 
5 connects the output of the memory buffer unit 803 to the data 
interconnect 515. 

As it can be appreciated the functionality of the . packet 
packer 801 and memory buffer unit can be combined into a 
10 single unit. 

Data words as detailed above are received by the data packer 
457. These words are transferred to the packet packer 801 via 
the interconnect 851 . 

15 

The packet packer 801 examines each of the words and controls 
the memory buffer unit 803 dependent pn the examined word. The 
packet packer 801 and the memory buffer unit work together to 
convert the words created by elements of the VDP 50 9 into a 
20 series of self contained packets of data each packet 
containing the data required to restore a single picture 
(either a frame or a field) . 

With reference to figure 10 the process through which the data 
25 packer processes the word created by the elements of the VDP 
509 are further detailed. Odd numbered steps on the left side 
are those carried out by the packet packer 801, even numbered 
steps on the right hand side are those carried out by the 
memory buffer unit 803. 

30 

For each separate image an image loop is initiated. The first 
step 1001 carried out by packet packer (PP) 801 is to read the 
buffer base address word 901. This is passed to the memory 
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buffer unit (HBO) 803 and stored in the MBO in step 1002 for 



later use 
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The next step 1003 is for the packet pecker 801 to read xn the 
picture level parameter header word 911. This is used to set a 
remaining picture level parameter counter in the packet packer 
801 to read the following words as picture level parameter 
words. The picture level parameter word is also used to set up 
th. MEU in terms of memory allocation for the picture level 
parameter words and to any picture level coding standard 
variations in the macro-blocks . The picture level parameter 
header word is also passed to the MBO and stored xn step 1004. 

The next step 1005 is to read in the following words until the 
remaining picture level parameter word counter reaches zero. 
These words are passed to the MBO 803 and stored xn step 1006. 

The next step 1007 forms the start of a macro-block loop The 
ma cro-block loop is repeated for each macro-block untx th. 
lut macro-block for that image is received. The -""-block 
header words 913,927 are initially read. The packet packer 801 
copies the flag settings 917,919,921.923, standard specxfxc 
parameter word 915 and motion vector length 925 and dependent 
on these values allocates memory in the MBO 803 for the macro- 
block. Furthermore the macro-block header words are passed to 
the MBO and stored in the MBO in step 1008. 

The next step 1009 determines if the L (slice) flag 919 has 

=„h " L is asserted the packet packer 

been asserted and u Jk *■ 

, 801 reads in the slice parameter words _ These words are then 
passed to the MBO" 803 and stored in the MBO in. step 1010.. 

Tha. na=L step, 1QIX detezsxxr^s xr- tn* J. i*™ M 

, ss sgaggteri - andLifi.ths I flag." 923 1 is. not as3^=.d. u«. 
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packet packer 801 reads the motion vector words 
931, 937, 943, 947 ♦ The number of motion vector words to be read 
is determined by the value of the standard specific parameter 
word 915 and motion vector length value 925 gathered from the 
5 macro-block header words 913,927 and as discussed earlier. The 
motion vector words are passed to the MBU and stored in the 
MBU in step 1212. This step therefore occurs where the macro- 
block being decoded is part of a predicted macro-block. 

10 The next step 1013 determines if both the I flag 923 and the S 
(Skipped) flag 921 have not been asserted. If both I and S 
flag are not asserted the packet packer 801 reads the pixel 
data residual words 955. The number of words per macro-block 
is known in the art and discussed earlier. The pixel data 

15 residual words 955 are passed to the MBU ♦ and stored in the MBU 
in step 1214. This step therefore occurs where the macro-block, 
being decoded is both part of a predicted macro-block and also 
a non-skipped macro-block. 

» 

20 The next step 1015 examines if the I flag is asserted. If the 
I flag is asserted, in other words the macro-block is part of 
an intra macro-block, then the packet packer 801 reads pixel 
data reference words 951. The number of words per macro-block 
is also known in the art and discussed earlier. The pixel data 

25 reference words 951 are passed to the MBU and stored in the 
MBU in step 1216. 

The next step 1017 examines if the E (End) flag is asserted. 
If the E flag is not asserted then the packet packer ends this 
30 instance of the macro-block loop and returns the process to 
step 1007 and the start of the next macro-block loop, awaiting 
data from the next macro-block. 
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If the E flag is asserted then step 1019 is carried out. Step 
1019 ends this instance of the image loop and returns the 
process to step 1001 and the start of the next picture packet. 
At the same time the packet packer 801 instructs the MBU to 
5 carry out step .1020. This step causes the MBU to issue a 
memory interrupt request. Once the request has been granted 
the memory buffer unit passes the stored packet to the memory 
management unit 403 and awaiting the transfer to shared memory 
501 storing the packet in the location provided by the buffer 
10 base memory address 901. Finally the MBU restores itself to an 
initial condition awaiting data from the next image. 

With reference to figure 11 the data structure of the final 
packet of data is further detailed. The packet 1101 comprises 
15 the picture level parameter header word 903, a picture level 
parameter block 1103 (which comprises the picture level 
parameter words 911) , and a macro-block data block 1105. 

The macro-block data block comprises a series of macro-block 
20 data elements 1105a to 1105n each of which define the macro 
blocks MB(0) to MB(N-l) respectively. A macro-block data 
element is shown in further detail also in figure 11. 

A typical data element 1105n, comprises in order of storage, 
25 the macro-block header first word 913, the macro-block header 
second word 927, a slice parameter block 1111, a motion vector 
block 1113 and a pixel data block 1115. 

The slice parameter, block 1111, \*hen it exists due to step 
30 1010, comprises the slice parameter. words_ 929 stored., together 
in. tire same block. 

The. uio.tir»n vector. blcKik 1113* when It- e:ctstH~ due to step-. 1017:, 



53 



10 



15 



0 



m 



arrangement of these words as discussed previously is such 
that the luma motion vector words 931,937 occur in the first 
part of the motion vector block 1113, and the chroma motion 
vector words 943,947 occur in the second part of the motion 
3 vector block 1113. 

The pixel data block 1115, when it exists due to steps 1014 or 
1016 comprises either pixel data reference words 951 (where 
the macro-block is part of a intra macro-block) , or pixel data 
residual words 955 (where the macro-block . is a part of a 
predicted macro-block and the macro-block is not skipped, . 

The data packet is then output to the internal data 
xnt erconnect 455. 

Whereas in the 'synchronous- data eonneot embodiments, ' 
detailed by the third and fourth embodiments of the present 
invention, the video data is processed by the predictor 
constructor 459 substantially as soon at it is received (with 
the possibility of being temporarily buffered by the memory 
management unit 4 53 or in the unpacker 683), the 
•asynchronous' data connect embodiments pass the processed 
packets to the memory management unit 453 to be stored in the 
shared memory 501. 

■ 

in the fifth embodiment the memory management unit therefore 
receives packet data from the data packer 457 and transmits 
the data packets to the shared memory 501 to a memory location 
as indicated by the data packet. 

These data packets can then be read from the shared memory 501 

when the VCP 519 is ablp _„„4- 

13 able to restore the predicted macro- 
blocks. The restoration and subsequent filtering, where 
required, of pictures is carried out in a manner similar to 
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the handling of packetized data as detailed previously in the 
first embodiment of the present invention. 

Thus such an embodiment allows the VDP to perform a series of 
5 picture decodes, which can be stored in the shared memory 
whilst the VCP performs the predicted picture reconstruction 
and possible in-loop filtering on previously decoded and 
fetched pictures. Thus as the VCP 'catches up' with processing 
the decoded pictures the VDP can be configured to perform 
10 other processing tasks - such as audio decoding. 

A sixth embodiment of the present invention can further be 
described with respect to figure 4b, wherein the VDP 
implemented is shown schematically by figure 5c and the VCP 

15 implemented is shown schematically by figure 6b. In this sixth 
embodiment, the VCP 519 receives the data from the VDP 509 and 
transmits the data to the shared memory 501 without first 
packetizing it. Thus the VCP carries out the task of storing 
the VDP processed data but without packetizing it. The VCP 

20 also then requests the VDP processed data in the manner 
described earlier during the first embodiment of the present 
invention. 

Thus the advantages of the fifth and sixth embodiments are 
25 that the VDP and VCP are linked by a direct interconnect that 
is not accessible by other shared components and therefore 
data can_.be passed directly within affecting the performance 
of the shared , data transport system.. Although being directly 
connected they are not synchronised to process data at the 
30 rate that picture production is synchronised and. therefore the 
picture data is buffered by the. shared memory. This: allows 
both- the. VCP andi.VDP: some f l eslbd 1 i r y in processing ability; 
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pictures and then reconstruct the nroH^f^ ^ 

ouiUGt rne predicted pictures at a time 

convenient for the VCP. 

It should be appreciated that further embodiments of the 
invention may be provided which combine different aspects of 
the described embodiments of the invention. 



Embodiments of the nrpqsni- j.,.^. , 

present invention have been described in 

the context of MPEG. However it should be appreciated that 
10 embodiments of the invention can be used with any other type 
of video data* 
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Claims: 

1. A video decoding circuit comprising: 
a first video data processor; 

5 a second video data processor; and 

a connection connecting said first video data processor 
and said second data processor; 

wherein said first video data processor is arranged to 
receive a first signal comprising encoded video data, process 
10 said first signal to provide a second signal and output said 
second signal , said first video data processor being arranged 
to process said first signal dependent on at least part of 
said received first signal, and 

said second video data processor is arranged to receive 

■ » 

15 at least a part of said second signal, process said at least a 
part of said second signal to provide a third signal, and 
output said third signal, said second and third signals 
comprising a decoded video image stream, and 

said second video data processor is arranged to process 

20 said at least part of said second signal dependent on at least 
part of said at least part of second signal. 

2. A circuit as claimed in claim 1, wherein said first 
video data processor is arranged to variable length decode 

25 said received first signal to produce a decoded first signal. 

3. A circuit as claimed in claim 2, wherein said first video 
data processor is arranged to separate said first signal data 
into at least a first part and' a second part, 

30 wherein said first part comprises at least one of; 

pixel data; 
residual data, and 

wherein said second part comprises motion vector data. 
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4. A circuit as claimed in claim 3, wherein said first video 
data processor is arranged to inverse quantize said first part 
of said first signal* 

5 

5. A circuit as claimed in claim 3 or 4, wherein said first 
video data processor is arranged to spatial domain transform 
said first part of said first signal. 

10 6. A circuit as claimed in claims 4 or 5, wherein said first 
video data processor is arranged to combine said spatial 
domain transformed and/or inverse quantized first part of said 
first signal with said second part of said first signal. 

15 7. A circuit as claimed in any previous claim, wherein said 
second video data processor is arranged to interpolate at 
least a first part of said second signal. 

8. A circuit as claimed in claim 7, wherein said second 
20 video data processor is arranged to interpolate at least a 

first part of said second signal using one of horizontal and 
vertical interpolation . 

9. A circuit as claimed in claim 8, further comprising a 
25 memory, said second video data processor being arranged to 

store said interpolated part of said second signal in said 
memory. 

10. A circuit, as claimed in claim 8 or 9, wherein said second 
30 video data processor is. arranged to interpolate said stored 

interpolated f i rs± parrtr of .said second" signal, using the other 
one. of. borizont? T and: vex±t.r:a.1 .interpolation^ 
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11. A circuit as claimed in claims 7 to 10, wherein said 
second video data processor is arranged to combine said 
interpolated part of said second signal and a further part of 
said second signal, 

wherein said interpolated part of said second signal 
comprises an estimated macro block, and said further part of 
said second signal comprises residual error data. 

12. A circuit as claimed in any previous claim wherein said 

second video data processor is arranged to filter at least one 

of said at least one part of said second signal and said third 
signal . 

13. A circuit as claimed in claim 12 wherein said filter 
comprises at least one of a de-ringing filter and a' de- 
blocking filter. 

14. A circuit as claimed in any previous claim, wherein said 
connection comprises a bus connecting said first and second 
video data processors. 

■ 

15. A circuit as claimed in claim 14, further comprising a 
memory device, said memory device being connected to said bus. 

t 

16. A circuit as claimed in claim 15, wherein said 
video data processor has an output for outputting said second 
signal to said memory device via said bus. 

17. A circuit as claimed in claim 16, wherein said second 
video data processor has an input for receiving said parts of 
said second signal from said memory device via said bus. 

18. A circuit as claimed in claims 1 to 17, wherein said 
connection comprises a data interconnect, said data 
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interconnect directly connecting said first video data 
processor and said second video data processor. 

19. A circuit as claimed in claim 18, wherein said first 
video data processor has an output for outputting said second 
signal to said data interconnect. 

20. A circuit as claimed in claim 18 and 19, wherein said 
second video data processor has an input for receiving said 
parts of said second signal from said data interconnect. 

21. A circuit as claimed in claim 20 when appended to claim 
15, wherein said second video data processor receives part of 
said parts of said second signal from said data interconnect 
and part of said parts of said second signal from said bus. 

22. A circuit as claimed in any previous claim, wherein said 
first signal is at least one of: 

a MPEG2 encoded video stream; 

a H.263 encoded video stream; 

a RealVideo9 encoded video stream; 

a Windows media player encoded video stream; 

a H.264 encoded video stream. 

23. A circuit as claimed in any previous claim, wherein said 
second signal comprises at least one of: 

buffer base address word; 

picture level parameter header word; 

picture level parameter word; 

macro-block header word; 

slice parameter word;. 

mnl-rnn vector, nor ijiontasLL lumar_woxd; 

„ _4- i --. - — - . ■ i r-.-r- T7-, - ■ ',- T".~ P 1 . 1'lTn-: v7<-'j-'-L^ 
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motion vector vertical chroma word; 
pixel data reference word and 
pixel data residual word. 

5 24. A circuit as claimed in any previous claim, wherein said 
first video data processor comprises a data packer. 

25. A circuit as claimed in claims 1 to 23, wherein said 
second video data processor comprises a data packer. 

10 

26. A circuit as claimed in claims 24 or 25, wherein said 
data packer comprises: 

an input , said input being arranged to receive said 

i 

second signal, said second signal comprising data words; 
15 means for ordering said data words; and 

an output, said output being arranged to transmit data 
packets comprising ordered data words. 

27. An integrated circuit comprising a circuit as claimed in 
20 any previous claim. 

28. A circuit as claimed in claims 1 to 26, wherein said 
first video data processor comprises a very long instruction 
word processor. 

. 25 

29. A circuit as claimed in claim 28, wherein said very long 
instruction word processor is adapted to process said first 
signal further dependent on a set of instructions stored in a 
memory. 

30 

30. A circuit as claimed in claims 1 to 29, wherein said 
second video data processor comprises a programmable 
processor. 



10 
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31. A video decoding method comprising the steps of: 

receiving at a first video data processor a first signal 

comprising encoded video data, 

processing said first signal to provide a second signal 
dependent on at least part of said first signal, 
outputting said second signal, 

receiving at least a part of said second signal at a 
second video data processor, 

processing said at least part of said second signal to 
provide a third signal dependent on at least part of said 

second signal , and 

outputting said third signal, 

wherein said second and third signals comprise a decoded 
video image stream. 

32. A method as claimed in claim 31, wherein said step of 
processing said first signal comprises the step of variable 
length decoding said first signal. 

20 33. A method as claimed in claims 31 and 32, wherein said 
step of processing said first signal comprises the step of 
separating said first signal into at least a first part and a 
second part, 

wherein said first part comprises at least one of: 

25 pixel data; 

residual data, and 

wherein said second part comprises motion vector data. 



15 



30 



34. A method as claimed in claim 33, wherein said step of 
processing said, first signcO. further comprises, the step of 
inverse quantising - said, ficst part of said first signal. 



35. A method- as _ claimed, in ."claims- 33 and. 34, wherein said. 
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step of spatial domain transforming said first part of said 
first signal. 

36. A method as claimed in claims 34 or 35, wherein said step 
5 of processing said first signal further comprises the step of 
combining said spatial domain transformed and/or inverse 
quantized first part of said first signal with said second 
part of said first signal* 

10 37- A method as claimed in claims 31 to 36, wherein said step 
of processing at least a part of said second signal further 
comprises the step of interpolating at least a first part of 
said second signal. 

15 38. A method as claimed in claim 37, wherein said step of 
interpolating at least a first part of said second: signal 
comprises the step of interpolating at least a first part of 
said second signal using one of horizontal and vertical 
interpolation. 

20 

* 

39. A method as claimed in claim 38, wherein said step of 
interpolating further comprises storing said interpolated part 
of said second signal. 

25 40. A method as claimed in claims 38 and 39 r wherein said 
step of interpolating further comprises interpolating said 
interpolated part of said second signal using the other one of 
horizontal and vertical interpolation. 

30 41. A method as claimed in claims 37 to 40, wherein said step 
of processing at least part of said second signal further 
comprises combining said interpolated part of said second 
signal and a further part of said second signal, wherein said 
interpolated part of said second signal comprises an estimated 
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macro block, and said further part of said second signal 
comprises residual error data. 

42. A method as claimed in claims 31 to 41, wherein said step 
of processing at least a part of said second signal further 
comprises the step of filtering, wherein said step of 
filtering comprises at least one of the steps: 

de-ringing filtering and de-blocking filtering. 

43. A method as claimed in claims 31 to 42, wherein said step 
of outputting said second signal further comprises the step of 
storing said second signal in a memory. 

44. A method as claimed in claims 31 to 43, wherein said step 
of receiving at least part of said second signal comprises 
receiving said at least part of said second signal directly 
from the first video data processor. 

45. A method as claimed in claims 44 when appended to claim 
40, wherein said step of receiving at least part of said 
second signal comprises receiving a first part of said at 
least part of said second signal directly from said first 
video data processor and a second part of said at least part 
of said second signal from said memory. 

46. A method as claimed in claim 31 to 45, wherein said step 
of processing said first signal further comprises the step of 
packetizing said second signal. 

47. A method as claimed in claims. 31 to 45, wherein said step 
of processing said, second signal further comprises the steps 

pa Efcs±dsing-sa±d.at lea^ part of srid^-.econd_^ignai; 
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storing said at least part of said second signal in a 
memory; and 

receiving said at least part of said stored second signal 
from said memory. 



ABSTRACT 



A VIDEO " DECODING DEVICE 

5 A video decoding circuit comprising: a first video data 
processor; a second video data processor; and a connection 
connecting the first video data processor and the second data 
processor; wherein the first video data processor is arranged 
to receive a first signal comprising encoded video data, 

10 process the first signal to provide a second signal and output 
said second signal. The first video data processor being 
arranged to process the first signal dependent on at least 
part of the received first signal. The second video data 
processor is arranged to receive at least a part of the second 

15 signal, process the at least a part of the second signal to 
provide a third signal, and output the third signal, the 
second and third signals comprising a decoded video image 
stream. The second video data processor is arranged to process 

t 

the at least part of the second signal dependent on at least 
20 part of the at least part of second signal. 
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